docs: KubeAIRunway Hub - Multi-Instance Portal Implementation Plan #67
sozercan wants to merge 2 commits
Add comprehensive implementation plan for extending KubeAIRunway with:
- Central portal/hub for multi-instance management
- OAuth authentication (Azure Entra ID + GitHub)
- Portal-as-proxy architecture (users never see cluster credentials)
- Azure Key Vault via Secrets Store CSI for credential storage
- PostgreSQL for app data (users, roles, sessions)
- Role-based access control with namespace isolation
- Azure Entra group sync for automatic access mapping
- Instance health dashboard with GPU capacity visibility
- Credential auto-refresh on rotation
| - Auto-refresh of rotated credentials (CSI volume watch) | ||
| ### Future Work (noted, not implemented) | ||
| - Audit logging (all user actions to PostgreSQL) |
IMO this should be v1 because it's a multi-tenant access proxy
| - Create `backend/src/services/oauth/` directory with provider interface | ||
| - Define `OAuthProvider` interface: `getAuthUrl()`, `exchangeCode()`, `getUserInfo()`, `refreshToken()` | ||
| - Implement Azure Entra ID provider: | ||
| - OIDC discovery (`https://login.microsoftonline.com/{tenant}/.well-known/openid-configuration`) |
Suggested change:
| - OIDC discovery (`https://login.microsoftonline.com/{tenant}/v2.0/.well-known/openid-configuration`) |
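A minimal sketch of how the provider could build the v2.0 discovery URL for a tenant. The helper name `entraDiscoveryUrl` and the tenant validation rule are assumptions for illustration and are not part of the plan.

```typescript
// Hypothetical helper: builds the Entra ID v2.0 OIDC discovery URL.
// The tenant may be a GUID, a verified domain, or a value such as "common".
function entraDiscoveryUrl(tenant: string): string {
  // Assumed validation: reject anything that could alter the URL path.
  if (!/^[A-Za-z0-9.-]+$/.test(tenant)) {
    throw new Error(`invalid tenant: ${tenant}`);
  }
  return `https://login.microsoftonline.com/${tenant}/v2.0/.well-known/openid-configuration`;
}
```

Validating the tenant before interpolation keeps a malformed config value from producing a URL that points at a different path.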
| - Token exchange, refresh token handling | ||
| - Extract user info + group memberships from ID token / `/me/memberOf` Graph API | ||
| - Implement GitHub provider: | ||
| - OAuth App flow (authorization code) | ||
| - Token exchange via `https://github.com/login/oauth/access_token` | ||
| - User info from `https://api.github.com/user` |
I think the `email` field will be null if the user has set their email to private. The plan should be:
- Request the `user:email` scope
- Call `GET https://api.github.com/user/emails` to retrieve verified emails
- Select the primary verified email as the user identifier
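The selection step could look roughly like the sketch below. It assumes the response of `GET https://api.github.com/user/emails` has already been fetched and parsed; the `GitHubEmail` type and the function name are illustrative, not from the plan.

```typescript
// Shape of one entry returned by GET /user/emails.
interface GitHubEmail {
  email: string;
  primary: boolean;
  verified: boolean;
  visibility: string | null;
}

// Hypothetical helper: returns the primary verified email, or null when
// none exists, so the caller can reject the login instead of guessing.
function pickPrimaryVerifiedEmail(emails: GitHubEmail[]): string | null {
  const match = emails.find((e) => e.primary && e.verified);
  return match ? match.email : null;
}
```

Returning `null` rather than falling back to an unverified address avoids using an identifier that the user has not proven they own.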
| - OIDC discovery (`https://login.microsoftonline.com/{tenant}/.well-known/openid-configuration`) | ||
| - Authorization code + PKCE flow | ||
| - Token exchange, refresh token handling | ||
| - Extract user info + group memberships from ID token / `/me/memberOf` Graph API |
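The group overage claim needs to be addressed: when a user belongs to too many groups, Entra omits the `groups` claim from the token, and the memberships have to be fetched from the Graph API instead.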
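One way to detect the overage case is to check for `_claim_names.groups` in the decoded token before trusting `groups`. The sketch below assumes the token has already been validated and decoded into a plain object; the function names are hypothetical.

```typescript
type TokenClaims = Record<string, unknown>;

// True when Entra replaced the groups claim with an overage indicator,
// that is, when "_claim_names" contains a "groups" entry.
function hasGroupOverage(claims: TokenClaims): boolean {
  const names = claims["_claim_names"];
  return typeof names === "object" && names !== null && "groups" in names;
}

// Hypothetical helper: returns group IDs from the token, or null when the
// caller must fall back to the Graph API (/me/memberOf) to list groups.
function groupsFromClaims(claims: TokenClaims): string[] | null {
  if (hasGroupOverage(claims)) {
    return null;
  }
  const groups = claims["groups"];
  return Array.isArray(groups)
    ? groups.filter((g): g is string => typeof g === "string")
    : [];
}
```

Using `null` for "ask Graph" and `[]` for "no groups" keeps the two cases distinct, so an overage is never mistaken for an empty membership list.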
| - OAuth App flow (authorization code) | ||
| - Token exchange via `https://github.com/login/oauth/access_token` | ||
| - User info from `https://api.github.com/user` | ||
| - Org/team membership for group-based access (optional) |
There is "groups" support for Entra users, so don't we need this for GitHub users as well, for parity?
| - Parse kubeconfig files into usable K8s client configs | ||
| - File watcher (fs.watch) for auto-refresh when CSI driver rotates secrets | ||
| - In-memory cache of parsed credentials, invalidated on file change | ||
| - Convention: each file in the mount path = one cluster's credentials, filename = instance identifier |
Is `filename = instance identifier` the best approach here? It creates tight coupling between the AKV secret name and `instances.name`.
I think we can rely on `instances.credential_ref` to map to the file name explicitly rather than relying on a naming convention, and add reconciliation checks.
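A reconciliation check along these lines could compare the `credential_ref` values in the `instances` table against the files present in the CSI mount. This is a sketch under assumptions: the `InstanceRow` shape, the result field names, and the function name are illustrative, and listing the mount directory is left to the caller.

```typescript
// Assumed subset of a row in the instances table.
interface InstanceRow {
  name: string;
  credential_ref: string;
}

interface ReconcileResult {
  // Instances whose credential_ref has no matching mounted file.
  instancesMissingCredentials: string[];
  // Mounted files that no instance references.
  unreferencedFiles: string[];
}

// Hypothetical check: pure comparison of database rows and mounted file names.
function reconcileCredentialRefs(
  instances: InstanceRow[],
  mountedFiles: string[],
): ReconcileResult {
  const mounted = new Set(mountedFiles);
  const referenced = new Set(instances.map((i) => i.credential_ref));
  return {
    instancesMissingCredentials: instances
      .filter((i) => !mounted.has(i.credential_ref))
      .map((i) => i.name),
    unreferencedFiles: mountedFiles.filter((f) => !referenced.has(f)),
  };
}
```

Running this at startup and after each file-change event would surface a renamed or deleted secret as a reported mismatch rather than a failed proxy request.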
| - Create `backend/src/services/cluster-proxy.ts`: | ||
| - Accept requests with instance context (from user's session) | ||
| - Validate user has access to target instance + namespace (RBAC check) | ||
| - Forward API calls to target cluster's KubeAIRunway using stored credentials |
Is there an allowlist of API paths? What happens if the request is `/api/v1/secrets` and the stored credentials end up having broad permissions?
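A deny-by-default path check in the proxy could look like the sketch below. The prefixes in `ALLOWED_PREFIXES` are placeholders and are not KubeAIRunway's actual API routes; the function name is also an assumption.

```typescript
// Placeholder prefixes for illustration only; the real list would come
// from the API surface the hub is meant to expose.
const ALLOWED_PREFIXES: string[] = ["/api/models", "/api/deployments", "/api/health"];

// Hypothetical check: returns true only when the decoded path sits under
// an allowed prefix. Anything else, including /api/v1/secrets, is rejected.
function isAllowedPath(rawPath: string, allowed: string[] = ALLOWED_PREFIXES): boolean {
  const q = rawPath.indexOf("?");
  const pathOnly = q === -1 ? rawPath : rawPath.slice(0, q);
  let path: string;
  try {
    // Decode first so encoded separators cannot bypass the prefix match.
    path = decodeURIComponent(pathOnly);
  } catch {
    return false; // malformed percent-encoding
  }
  if (path.includes("..") || path.includes("//")) {
    return false; // reject traversal and empty segments
  }
  return allowed.some((p) => path === p || path.startsWith(p + "/"));
}
```

Matching on `prefix + "/"` rather than the bare prefix keeps a path such as `/api/deploymentsX` from slipping through, and the check runs before any stored credential is attached to the outgoing request.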
- Move audit logging from Future Work to v1 scope (multi-tenant proxy requirement)
- Fix OIDC discovery URL to use v2.0 endpoint
- Add PKCE explicitly for GitHub OAuth flow
- Handle GitHub private emails via `user:email` scope + `/user/emails` API
- Add Entra group overage claim handling (>150 groups fallback to Graph API)
- Promote GitHub org/team sync from optional to required (parity with Entra)
- Use `credential_ref` for CSI file mapping instead of filename convention
- Add API path allowlist for cluster proxy to prevent credential over-privilege
- Add `audit_log` table schema in Technical Notes
- Update testing strategy to cover new security features

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Sertac Ozercan <sozercan@gmail.com>
Summary
This PR adds the implementation plan for KubeAIRunway Hub — a central portal for managing multiple KubeAIRunway instances across clusters with OAuth authentication.
Key Features Planned
Architecture
Implementation Phases
Future Work (noted, not in scope)
See `docs/hub-implementation-plan.md` for the full plan.